An efficient Dai-Yuan projection-based method with application in signal recovery

The Dai-Yuan conjugate gradient (CG) method is one of the classical CG algorithms whose update parameter has the numerator $\|g_{k+1}\|^2$. When the standard Wolfe line search is used, the algorithm satisfies the descent condition and converges globally under the Lipschitz condition. Despite these two advantages, the Dai-Yuan algorithm often performs poorly in practice because of the jamming problem. This work presents an efficient variant of the Dai-Yuan CG algorithm that solves nonlinear constrained monotone systems (NCMS) and resolves the aforementioned problem. Like the unmodified version, our variant converges globally under the Lipschitz condition, and it satisfies the sufficient descent requirement regardless of the line search method used. Numerical comparisons with algorithms from the literature show that the variant is numerically robust. Finally, the variant is applied to reconstruct sparse signals in compressed sensing (CS) problems.


Introduction
Firstly, we consider the nonlinear minimization problem

$$\min_{x\in\mathbb{R}^n} f(x), \tag{1}$$

in which $f:\mathbb{R}^n\to\mathbb{R}$ is a smooth function of $n$ variables. The following notation is used throughout this work: $\nabla f(x_k) = g(x_k) = g_k$ and $f(x_{k+1}) = f_{k+1}$. The conjugate gradient (CG) method, which requires minimal memory to implement, has become a popular choice for solving (1). Given an arbitrary $x_0\in\mathbb{R}^n$, the method generates iterates of the form

$$x_{k+1} = x_k + \vartheta_k d_k, \qquad k = 0, 1, 2, \dots, \tag{2}$$

where $x_k$ is the $k$th iterate and $\vartheta_k > 0$ is a step length that is usually obtained by an appropriate line search procedure [1-6]. The CG direction $d_k$ is given by

$$d_0 = -g_0, \qquad d_{k+1} = -g_{k+1} + \beta_k d_k, \tag{3}$$

in which the parameter $\beta_k$ largely determines the robustness of the particular CG algorithm. Over the decades, various formulas for this parameter have been proposed. The classical $\beta_k$ updates are

$$\beta_k^{FR} = \frac{\|g_{k+1}\|^2}{\|g_k\|^2}, \qquad \beta_k^{CD} = -\frac{\|g_{k+1}\|^2}{g_k^\top d_k}, \qquad \beta_k^{DY} = \frac{\|g_{k+1}\|^2}{d_k^\top y_k}, \tag{4}$$

$$\beta_k^{HS} = \frac{g_{k+1}^\top y_k}{d_k^\top y_k}, \qquad \beta_k^{PRP} = \frac{g_{k+1}^\top y_k}{\|g_k\|^2}, \qquad \beta_k^{LS} = -\frac{g_{k+1}^\top y_k}{g_k^\top d_k}, \tag{5}$$

where $y_k = g_{k+1} - g_k$ and $\|\cdot\|$ denotes the $\ell_2$-norm of vectors. For more $\beta_k$ updates, the reader is referred to [1,2,14,15]. A CG direction satisfies the descent condition (DC) if

$$g_k^\top d_k < 0 \quad \forall k \ge 0, \tag{6}$$

and the sufficient DC if there exists a constant $c > 0$ such that

$$g_k^\top d_k \le -c\|g_k\|^2 \quad \forall k \ge 0. \tag{7}$$

For many CG algorithms, establishing the sufficient DC (7) is the key step in the global convergence analysis. Now, observe that the parameters in (4) and (5) have the common numerators $\|g_{k+1}\|^2$ and $g_{k+1}^\top y_k$, respectively. It was noted in [14] that the global convergence theorem of CG methods with $\beta_k$ defined in (4) depends only on the Lipschitz condition and is independent of the boundedness assumption. Also, CG schemes with the update parameters in (4) converge globally to the minimizer of $f$ in (1) when the step size is computed exactly, namely, $\vartheta_k = \arg\min_{\vartheta > 0} f(x_k + \vartheta d_k)$. Despite this, the performance of methods using the parameters in (4) is not encouraging, because they are prone to jamming: the algorithms generate many short steps that make little progress toward the minimizer. On the other hand, the CG methods with the parameters in (5) exhibit good numerical performance, because they possess an automatic restart mechanism that avoids the jamming experienced by the methods with $\beta_k$ in (4). However, these methods converge globally only when $f$ is strongly convex, for example the quadratic

$$f(x) = \tfrac{1}{2}x^\top A x + b^\top x, \tag{8}$$

where $A$ is a symmetric positive-definite matrix and $b\in\mathbb{R}^n$, and the exact line search is employed, or when the step size approaches zero and the Lipschitz condition is assumed. It is important to note that the methods with $\beta_k$ defined by (4) and (5) are equivalent whenever $f$ satisfies (8) and the exact line search is employed. These schemes also satisfy the pure conjugacy condition

$$d_{k+1}^\top A d_k = 0, \tag{9}$$

with $A$ as defined above, which ensures that $f$ is minimized in at most $n$ steps. Now, let $s_k = x_{k+1} - x_k$ and let $y_k$ be as defined above. Then, by the mean-value theorem for general nonlinear functions, there exists $\tau\in(0,1)$ such that

$$y_k = \nabla^2 f(x_k + \tau s_k)\, s_k. \tag{10}$$

Considering (9) and (10), it is natural to consider the conjugacy condition

$$d_{k+1}^\top y_k = 0, \tag{11}$$

which, combined with (3), yields the update parameter $\beta_k^{HS}$ in (5). Furthermore, both conjugacy conditions (9) and (11) are meaningful only with the exact line search, which is not feasible in practical computations. Following this development, Perry employed the quasi-Newton condition to derive a modification of (11) that can be used with inexact line searches, namely

$$d_{k+1}^\top y_k = -g_{k+1}^\top s_k. \tag{12}$$

Building on (12), Dai and Liao [16] incorporated a parameter $t\in(0,1)$ and presented the variant

$$d_{k+1}^\top y_k = -t\, g_{k+1}^\top s_k. \tag{13}$$

Since the Dai-Yuan (DY) method shares the shortcomings of the CG methods with $\beta_k$ in (4), researchers have attempted over the years to develop modifications of it for solving (1) with better performance and global convergence guarantees [17-20]. In line with this, Andrei [21] proposed a scaled DY-type method for solving (1) that satisfies the sufficient descent and conjugacy conditions. Andrei [22] also proposed a hybrid method for solving (1), in which $\beta_k$ is a convex combination of the DY and HS parameters. Based on ideas of the quasi-Newton method by Li and Fukushima [23], Wei et al. [24], Zhang et al. [25], and Zhang [26], two DY-type algorithms for solving (1) were proposed. One of the schemes converges globally for nonconvex functions, while the second inherits the good performance of the HS method. For other DY-type methods proposed in the literature, see [27-34] and the references therein.
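To make the classical updates (4) and (5) concrete, the following Python sketch computes each $\beta_k$ from $g_k$, $g_{k+1}$ and $d_k$. It is a minimal illustration, not code from this work, and the helper names `classical_betas` and `cg_direction` are hypothetical. On a strictly convex quadratic with exact line search, all six values should coincide, which is the equivalence mentioned above.

```python
import numpy as np

def classical_betas(g_new, g_old, d_old):
    """Classical CG parameters (4)-(5), with y_k = g_{k+1} - g_k."""
    y = g_new - g_old
    gn2 = g_new @ g_new                  # ||g_{k+1}||^2, numerator of (4)
    gy = g_new @ y                       # g_{k+1}^T y_k, numerator of (5)
    return {
        "FR": gn2 / (g_old @ g_old),
        "CD": -gn2 / (g_old @ d_old),
        "DY": gn2 / (d_old @ y),
        "HS": gy / (d_old @ y),
        "PRP": gy / (g_old @ g_old),
        "LS": -gy / (g_old @ d_old),
    }

def cg_direction(g_new, d_old, beta):
    """CG direction (3): d_{k+1} = -g_{k+1} + beta_k d_k."""
    return -g_new + beta * d_old

if __name__ == "__main__":
    # One exact-line-search step on f(x) = 0.5 x^T A x - b^T x (illustrative data).
    A = np.diag([1.0, 2.0, 3.0])
    b = np.ones(3)
    x = np.zeros(3)
    g = A @ x - b
    d = -g
    step = -(g @ d) / (d @ A @ d)        # exact minimizer along d
    g_new = A @ (x + step * d) - b
    print(classical_betas(g_new, g, d))  # all six entries agree
```

The exact step $\vartheta = -g^\top d / (d^\top A d)$ makes $g_{k+1}^\top d_k = 0$ and $g_{k+1}^\top g_k = 0$, which is why the numerators and denominators in (4) and (5) collapse to the same ratio.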
It has become common for nonlinear real-life problems, which are often modelled as (1), to be formulated as the convex-constrained nonlinear problem

$$\text{find } x\in C \text{ such that } F(x) = 0, \tag{14}$$

in which $C$ is a nonempty, closed, convex subset of $\mathbb{R}^n$, and $F:\mathbb{R}^n\to\mathbb{R}^n$ is continuous and monotone, i.e., $F$ satisfies

$$(F(x) - F(y))^\top (x - y) \ge 0, \quad \forall x, y\in\mathbb{R}^n. \tag{15}$$

Furthermore, owing to the first-order optimality condition, (1) and (14) are closely related: when $\nabla f = F$, the mapping $F$ can be regarded as the gradient mapping of (1). Because of this connection and the attractive properties of CG methods for (1), many methods for solving (14) have been proposed in recent decades by researchers working on large-scale problems. Their search directions are typically defined by

$$d_0 = -F_0, \qquad d_{k+1} = -F_{k+1} + \beta_k d_k,$$

where $F_k = F(x_k)$ and $\beta_k$ is determined by (4), (5), or their modifications. Problem (14) arises in applications such as $\ell_1$-norm regularization problems in compressed sensing [35-40] and equilibrium problems [41,42].
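Condition (15) can be probed numerically on sample points. The sketch below uses a hypothetical operator $F(x) = 2x - \sin x$ (componentwise), chosen only because it is monotone (each component has derivative $2 - \cos x_i \ge 1$), together with the nonnegative orthant $C = \{x : x \ge 0\}$ as an illustrative closed convex set; neither is one of the test problems of this work. A check on random pairs is evidence, not a proof, of monotonicity.

```python
import numpy as np

def F_example(x):
    # Hypothetical monotone operator: componentwise 2*x_i - sin(x_i).
    return 2.0 * x - np.sin(x)

def project_nonneg(x):
    # Euclidean projection onto the closed convex set C = {x : x >= 0}.
    return np.maximum(x, 0.0)

def monotonicity_gap(F, x, y):
    # Left-hand side of (15): (F(x) - F(y))^T (x - y); a monotone F gives >= 0.
    return float((F(x) - F(y)) @ (x - y))

def min_gap_on_random_pairs(F, n=50, trials=200, seed=0):
    # Smallest monotonicity gap observed over random pairs of points.
    rng = np.random.default_rng(seed)
    return min(monotonicity_gap(F, rng.normal(size=n), rng.normal(size=n))
               for _ in range(trials))

if __name__ == "__main__":
    print(min_gap_on_random_pairs(F_example))   # nonnegative for this F
```

A negative value would certify that a candidate $F$ is not monotone; for instance, $F(x) = -x$ fails immediately.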
In recent years, variants of DY algorithms [30] have been developed to solve problem (14). These include the spectral DY projection method of Liu and Li [43], which combines the classical DY parameter [9], the spectral gradient method [44], and the projection algorithm proposed in [45]. In this scheme, however, $C = \mathbb{R}^n$. Moreover, the authors proved the global convergence of their method without assuming that the system is differentiable. Also, in [46], Liu and Li combined the multivariate spectral gradient method [47] with the DY scheme [9] to obtain a spectral DY-type method for solving problem (14). Motivated by [43,48], Liu and Feng [49] presented a modified DY-type method for solving problem (14); its derivative-free structure makes it well suited to large-scale nonsmooth problems. Under the Lipschitz continuity assumption, the authors proved the global convergence of the new method. Inspired by [49], Sani et al. [50] presented a DY-type method for solving (14), whose search direction is obtained from a convex combination of the unmodified DY and CD parameters, and proved its global convergence under mild assumptions. More recently, Alhobaiti et al. [51] proposed two scaled DY-type algorithms for solving (14), which use two different approaches to compute the scaling parameter, and showed that both methods satisfy (7) irrespective of the line search strategy used.
The aims of this work are as follows:
• To develop an efficient DY-type scheme for solving problem (14) that inherits the attractive properties of the classical DY method while avoiding its shortcomings.
• To present a DY-type algorithm that ensures sufficient DC.
• To develop a method with restart and scaling properties for accelerating the convergence.
• To demonstrate global convergence and analyse the proposed method's convergence rate.
• To demonstrate the application of the method in sparse signal reconstruction.
The paper is organised as follows. The motivation and details of the proposed method are discussed in the next section. Section 3 contains the convergence results, Section 4 presents a detailed discussion of the numerical results, Section 5 presents an application of the scheme, and Section 6 draws conclusions.

Motivation and details of the method
We begin this section by stating the Cauchy-Schwarz inequality, which will be recalled in the derivation and in the convergence analysis of the proposed algorithm. Let $u$ and $v$ be vectors in an inner product space; then

$$|\langle u, v\rangle| \le \|u\|\,\|v\|, \tag{16}$$

where $\langle u, v\rangle$ denotes the inner product of $u$ and $v$.
It is known that the sufficient DC (7) holds for all the classical CG methods when the objective function is convex quadratic and the exact line search is employed. In practical computations, however, where inexact line searches are used, (7) does not hold in general for these methods. For example, substituting the HS parameter into (3) and premultiplying by $g_{k+1}^\top$, we obtain

$$g_{k+1}^\top d_{k+1} = -\|g_{k+1}\|^2 + \beta_k^{HS}\, g_{k+1}^\top d_k. \tag{17}$$

Clearly, (17) satisfies the sufficient DC if $\beta_k^{HS} \ge 0$ and $g_{k+1}^\top d_k \le 0$. On the other hand, if $\beta_k^{HS} \ge 0$ and $g_{k+1}^\top d_k > 0$, the condition may fail, because the term $\beta_k^{HS} g_{k+1}^\top d_k$ may exceed $\|g_{k+1}\|^2$. To remedy this defect of the HS method, Dong et al. [52] modified the method so that it satisfies the sufficient DC while inheriting the attractive attributes of the original scheme. The authors proposed the modified HS search direction (18), which involves a parameter $\lambda_{k+1}$ defined in [52]. Motivated by this approach, Aminifard and Babaie-Kafaki [53] made a similar modification to the classical PRP method, which has the same built-in restart mechanism as the HS method but, as shown above for the HS method, generally fails to satisfy (7) under inexact line searches. The search direction (19) proposed in [53] satisfies (7) while retaining the attractive attributes of the unmodified PRP method.

Inspired by (18), (19), and the classical DY method, we propose the DY-type search direction (20), in which $t$ is a positive parameter in the interval $(0, 2)$. The parameters $\beta_k^{MDY1}$ and $\beta_k^{MDY2}$ in (20) are improved DY CG parameters; both are constructed in a similar way to $\beta_k^{DHS}$, proposed by Dong et al. [52], and $\beta_k^{MPRP}$, proposed by Aminifard and Babaie-Kafaki [53]. The direction (20) has both scaling and restarting properties, which allow it to satisfy the sufficient descent condition without the aid of any line search procedure and to accelerate convergence both numerically and theoretically. From (24) and the monotonicity of $F$, we obtain (25), where the second equality uses the definition $s_k = z_k - x_k = -\vartheta_k d_k$ and the inequality follows from the monotonicity of $F$. The steps of the spectral-modified Dai-Yuan algorithm are as follows.
Algorithm 1 (Spectral Modified Dai-Yuan Method)
… holds.
4: Determine $d_{k+1}$ by formula (20).
5: Set $k = k + 1$ and repeat the procedure from Step 1 until the stopping condition is satisfied.
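Because the displays defining (20)-(26) are not reproduced here, the following Python sketch shows only the generic structure to which Algorithm 1 belongs: a derivative-free hyperplane-projection method (in the style of Solodov and Svaiter) with a backtracking line search and a safeguarded DY-type direction. Every specific is an assumption of the sketch: the line-search test $-F(z)^\top d \ge \sigma\vartheta\|d\|^2$ with $z = x + \vartheta d$, the relaxed projection step with factor $\psi$, the vector $w_k = F(z_k) - F(x_k) + r s_k$ with $s_k = z_k - x_k$ and a hypothetical $r > 0$, and the fallback $d_{k+1} = -F_{k+1}$ whenever $F_{k+1}^\top d_k > 0$. It is not the SDYM direction (20); the defaults ($z = 0.4$, called `rho` here, $\sigma = 10^{-3}$, $\psi = 1.9$, tolerance $10^{-10}$, 1000 iterations) are borrowed from the numerical section for orientation only.

```python
import numpy as np

def dy_projection_solve(F, project, x0, tol=1e-10, max_iter=1000,
                        rho=0.4, sigma=1e-3, psi=1.9, r=0.1, max_backtracks=60):
    """Generic DY-type hyperplane-projection sketch (NOT the SDYM direction (20))."""
    x = project(np.asarray(x0, dtype=float))
    Fx = F(x)
    d = -Fx                                   # d_0 = -F_0
    for k in range(max_iter):
        if np.linalg.norm(Fx) <= tol:
            return x, k
        # Backtracking: theta = rho**i until -F(z)^T d >= sigma*theta*||d||^2 (assumed test).
        theta = 1.0
        for _ in range(max_backtracks):
            z = x + theta * d
            Fz = F(z)
            if -(Fz @ d) >= sigma * theta * (d @ d):
                break
            theta *= rho
        if np.linalg.norm(Fz) <= tol:
            return z, k + 1
        # Relaxed projection onto the hyperplane {y : F(z)^T (y - z) = 0}, then onto C.
        alpha = (Fz @ (x - z)) / (Fz @ Fz)
        x_new = project(x - psi * alpha * Fz)
        F_new = F(x_new)
        # By monotonicity, d^T w >= r * theta * ||d||^2 > 0 for this choice of w.
        s = z - x
        w = (Fz - Fx) + r * s
        if F_new @ d <= 0:
            d = -F_new + (F_new @ F_new) / (d @ w) * d   # DY-type step, beta >= 0
        else:
            d = -F_new                                   # restart keeps descent
        x, Fx = x_new, F_new
    return x, max_iter
```

For example, with the monotone map $F(x) = 2x - \sin x$ and $C$ the nonnegative orthant, `dy_projection_solve(lambda v: 2*v - np.sin(v), lambda v: np.maximum(v, 0), np.ones(100))` should reach the tolerance well within the iteration limit. The restart to $-F_{k+1}$ is the simplest way to guarantee $F_{k+1}^\top d_{k+1} \le -\|F_{k+1}\|^2$ in both branches; (20) instead obtains sufficient descent through its own scaling and restart terms.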

Convergence results
This section establishes the convergence rate and the global convergence of the spectral-modified Dai-Yuan algorithm. We start with the following assumptions: … Before proceeding, we state an important property of the projection operator, which will be recalled later.
Lemma 1. Let $C$ be as defined in (14). Then …
Lemma 2. The sequence of search directions generated by the spectral-modified Dai-Yuan algorithm satisfies the inequalities …, where $\mu > 2$ and $t\in(0, 2)$.
Proof: It can be seen from (20) that, for $k = 0$, $F_0^\top d_0 = -\|F_0\|^2$. Next, we analyze the two cases in (20) for $k = 1, 2, \dots$.

First case: $F_{k+1}^\top d_k > 0$.
By premultiplying (20) by $F_{k+1}^\top$, we obtain (32).
Second case: $F_{k+1}^\top d_k \le 0$. Since $\|F_{k+1}\|^2$ and $d_k^\top w_k$ are both positive, premultiplying the second-case expression for $d_{k+1}$ by $F_{k+1}^\top$ gives (33). Therefore, (32) and (33) yield the inequality in (31), which establishes it. Now, from the Cauchy-Schwarz inequality (16) and (31), we obtain the first inequality. It can also be seen from (20) that, for $k = 0$, $d_0 = -F_0$, which implies that $\|d_0\| = \|F_0\|$. Now, for $k = 1, 2, \dots$, we analyze two cases.
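The second-case argument can be written out as a worked derivation. The display below assumes, for illustration only, that the second branch of (20) reduces to the DY-type form $d_{k+1} = -F_{k+1} + \beta_k d_k$ with $\beta_k = \|F_{k+1}\|^2/(d_k^\top w_k)$; the actual branch of (20) may contain additional scaling terms.

```latex
\begin{aligned}
F_{k+1}^\top d_{k+1}
  &= -\|F_{k+1}\|^2 + \frac{\|F_{k+1}\|^2}{d_k^\top w_k}\, F_{k+1}^\top d_k
   \le -\|F_{k+1}\|^2,
   \qquad \text{since } d_k^\top w_k > 0 \text{ and } F_{k+1}^\top d_k \le 0,\\
\|F_{k+1}\|^2 &\le -F_{k+1}^\top d_{k+1} \le \|F_{k+1}\|\,\|d_{k+1}\|
  \;\Longrightarrow\; \|d_{k+1}\| \ge \|F_{k+1}\| \quad \text{by (16), whenever } F_{k+1} \neq 0.
\end{aligned}
```

Under this assumption, the second case gives the sufficient DC with $c = 1$, and the Cauchy-Schwarz step turns it into a lower bound on $\|d_{k+1}\|$.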
(2) Suppose that Assumption 2 holds for the sequences $\{x_k\}$ and $\{z_k\}$ generated by the spectral-modified Dai-Yuan algorithm. Then $\alpha_k > 0$ satisfies …
Proof: The proof of (1) is omitted since it is given in [54]. To show (2), note that if Algorithm 1 terminates at some iterate $x_k$, then $F(x_k) = 0$ or $F(z_k) = 0$. Otherwise $F(x_k) \neq 0$, and hence $d_k \neq 0$ by (31). We now show that the line search (26) always terminates after finitely many trials. If (26) holds with $\vartheta_k = 1$, there is nothing to prove; otherwise, $\bar{\vartheta}_k = z^{-1}\vartheta_k$ does not satisfy (26), namely, …. From Assumption 2 and (26), we can write …. Therefore, we have …, where the second inequality follows from (36).
Lemma 4. Suppose that Assumptions 1 and 2 hold. Then $\{\|x_k - \bar{x}\|\}$ is convergent for any solution $\bar{x}\in\bar{C}$, and consequently ….
Proof: We begin by proving the boundedness of $\{x_k\}$ and $\{z_k\}$. From (15) and the definition of $z_k$, we obtain (39). From (15) and $\bar{x}\in\bar{C}$, we get (40). Using (30), (27), (28), (39), and (40), we have (41), from which it follows that $\{\|x_k - \bar{x}\|\}$ is a decreasing sequence. Hence it is bounded, and so $\{x_k\}$ is also bounded. From the boundedness of $\{x_k\}$ and the continuity of $F$, there exists a constant $\xi_1$ such that $\|F(x_k)\| \le \xi_1$ for all $k \ge 0$ (42). Also, from (36) and (42), we obtain …. Hence, there exists a constant $\xi_2$ such that $\|d_{k+1}\| \le \xi_2$, where …. From (15), (16), (39), and (42), we have …, which further implies that …. By setting $\xi_3 := (\xi_1 + s\xi_1)/s$, the boundedness of $\{z_k\}$ is established. Also, the continuity of $F$ implies that there exists a constant $\bar{c}$ such that $\|F(z_k)\| \le \bar{c}$. This, together with (41), yields …. Since $\{\|x_k - \bar{x}\|\}$ is convergent and $\{F(z_k)\}$ is bounded, it follows from (43) that ….
Theorem 5. Suppose that Assumptions 1 and 2 hold. Then the sequence $\{x_k\}$ converges to a solution of (14).
Proof: From (37) and (38), we have $0 \le \vartheta\|d_k\| \le \vartheta_k\|d_k\| \to 0$, and hence $\lim_{k\to\infty}\|d_k\| = 0$.
From (31), we obtain …, which shows that $\lim_{k\to\infty}\|F_k\| = 0$. From (38) and the boundedness of $\{x_k\}$, the sequence $\{x_k\}$ has a cluster point. Let $\tilde{x}$ be a cluster point of $\{x_k\}$, and let $K\subseteq\{0, 1, 2, \dots\}$ be an index set such that $\lim_{k\to\infty,\, k\in K} x_k = \tilde{x}$. Then, by the continuity of $F$, we get $F(\tilde{x}) = \lim_{k\to\infty,\, k\in K} F(x_k) = 0$, which implies that $\tilde{x}$ is a solution of (14), i.e., $\tilde{x}\in\bar{C}$. Furthermore, $\{\|x_k - \bar{x}\|\}$ is convergent by Lemma 4. Therefore, setting $\bar{x} = \tilde{x}$, we obtain …. Replacing $\bar{x}$ with $\tilde{x}$ in (41), we have …. From (29), (36), (45), and since for a solution …. Next, from (31) and the Cauchy-Schwarz inequality (16), taking $c = 1$, we have …. Thus, since $\tilde{x}\in\bar{C}$, using (44), (45), (46), (47), (48), and (37), we obtain …. Since $\tau, \sigma, \vartheta\in(0, 1)$, and …

Numerical results
In this section, we present numerical results for the spectral-modified Dai-Yuan algorithm, which we refer to as SDYM, to demonstrate its usefulness. The proposed algorithm is compared with the following recent spectral-type methods from the literature:
• A method for solving nonlinear monotone equations using scaled gradient projections (SCGP) [55].
We compare SDYM with these algorithms because they are also derivative-free, have good convergence properties, and are numerically efficient; for more details on the compared algorithms, we refer readers to the cited references. All four algorithms are coded in Matlab R2015 and run on a PC with 4 GB RAM and a 2.30 GHz CPU. All methods are implemented with the line search (26), and the SCGP, MDY, and SRMIL parameters are set as in the respective papers. The SDYM parameters are $z = 0.4$, $\sigma = 10^{-3}$, $\psi = 1.9$, $\mu = 2.5$, and $t = 0.6$. Each run terminates when $\|F(x_k)\| \le 10^{-10}$ or $\|F(z_k)\| \le 10^{-10}$, or when 1000 iterations are reached. Six examples of the operator $F$, presented as Examples 1 to 6, are used in the experiments, with dimensions $10^3$, $10^4$, and $5\times 10^4$ and the following initial points: … The numerical results for these six monotone operator equations are obtained with $C$ taken as a convex constraint set. The examples are as follows:
Example 1. This is a nonsmooth function that can be found in [58].
Example 2. This is a nonsmooth function that can be found in [59].
Example 3. This is Penalty Function 1, which can be found in [50].
Example 5. This is an artificial problem that can be found in [60].
Example 6. This is an artificial problem that can be found in [61].
The numerical results for the considered algorithms are reported in Tables 1-3, whose column labels have the following meanings: "VAR" and "PN" denote the dimension and the number of the example solved; "IG" and "NOI" denote the initial guess and the number of iterations, respectively; "FV" and "Ptime" denote the function values and the processing time in seconds; "Norm" denotes the norm of $F$ at the termination point; and "***" indicates that an algorithm failed after reaching the maximum number of iterations (1000).
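From Tables 1-3, we observe that SDYM is more effective than the other three algorithms, since it successfully solved all six examples. To further illustrate the performance of each algorithm, the performance profile tool developed by Dolan and Moré [62] is used to plot Figs 1-3 with respect to our performance metrics, namely function values, iterations, and processing time. In each figure, the left side of the curves gives the percentage of examples that each algorithm solved with the minimum value of the metric, the right side gives the percentage of examples that each algorithm solved successfully, and the algorithm with the topmost curve is the one that solves the majority of the examples. In addition, the curve representing SDYM remains above the curves representing the SCGP, MDY, and SRMIL algorithms. Based on this discussion, we conclude that SDYM is promising for solving (14).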
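For reference, a Dolan-Moré performance profile can be computed as in the following Python sketch. It is a generic illustration of the tool cited as [62], not the script used to produce Figs 1-3; the cost-matrix layout (problems by solvers, with `np.inf` marking failures such as the "***" entries) and the function name are assumptions.

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profile.

    T[p, s] is the positive cost (iterations, function values or time) of
    solver s on problem p, with np.inf for a failure; at least one solver is
    assumed to succeed on every problem. Returns rho with
    rho[i, s] = fraction of problems on which r[p, s] <= taus[i].
    """
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)   # best cost on each problem
    ratios = T / best                      # performance ratios r[p, s]; failures stay inf
    taus = np.asarray(taus, dtype=float)
    # Compare every ratio with every tau, then average over problems.
    return (ratios[None, :, :] <= taus[:, None, None]).mean(axis=1)

if __name__ == "__main__":
    T = [[1.0, 2.0], [3.0, 3.0], [2.0, np.inf]]
    print(performance_profile(T, [1.0, 2.0]))
```

The value of each curve at $\tau = 1$ is the fraction of examples on which a solver attains the best cost, and its limit for large $\tau$ is the fraction of examples it solves, which is why the topmost curve indicates the best-performing method.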

Sparse signal reconstruction
Here, we discuss the application of SDYM to signal reconstruction problems. To further test its performance, SDYM is compared with the HTTCGP [40] and PCG [38] algorithms. Sparse signal reconstruction refers to recovering a sparse signal from a small number of possibly noisy linear measurements. It arises often in practice, for example in astrophysical signal processing, machine learning, wireless sensor networks, video coding, medical imaging, radar, and compressive imaging [63]. The goal is to obtain sparse solutions of the underdetermined linear system $Sx = b$, in which $S\in\mathbb{R}^{k\times n}$ $(k \ll n)$ is a sampling matrix, $x$ denotes the original sparse signal, and $b\in\mathbb{R}^k$ is the observed vector. To reconstruct $x$ from $Sx = b$, the $\ell_1$-norm regularization problem

$$\min_{x\in\mathbb{R}^n} \tfrac{1}{2}\|Sx - b\|_2^2 + \gamma\|x\|_1 \tag{49}$$

is solved, where $\gamma > 0$ is a parameter. To solve (49), the authors of [64] reformulated it as a convex quadratic program by splitting $x\in\mathbb{R}^n$ into two parts,

$$x = v - w, \qquad v \ge 0, \quad w \ge 0,$$

where $v_i = (x_i)_+$, $w_i = (-x_i)_+$ for all $i = 1, 2, \dots, n$, and $(\cdot)_+ = \max\{0, \cdot\}$. With this splitting, $\|x\|_1 = e_n^\top v + e_n^\top w$, where $e_n = (1, \dots, 1)^\top\in\mathbb{R}^n$, so (49) can be written as

$$\min_{v, w}\ \tfrac{1}{2}\|b - S(v - w)\|_2^2 + \gamma e_n^\top v + \gamma e_n^\top w, \qquad v \ge 0,\ w \ge 0. \tag{50}$$

By setting

$$u = \begin{pmatrix} v \\ w \end{pmatrix}, \qquad c = \gamma e_{2n} + \begin{pmatrix} -S^\top b \\ S^\top b \end{pmatrix}, \qquad H = \begin{pmatrix} S^\top S & -S^\top S \\ -S^\top S & S^\top S \end{pmatrix},$$

(50) can be reformulated as the standard bound-constrained quadratic program

$$\min_{u}\ \tfrac{1}{2}u^\top H u + c^\top u, \qquad u \ge 0. \tag{51}$$

This problem was restated as the system of nonlinear equations

$$F(u) = \min\{u,\ Hu + c\} = 0, \tag{52}$$

where the minimum is taken componentwise; this system is equivalent to the convex quadratic program (51), see [65]. Since $F$ was proven in [65,66] to be monotone and Lipschitz continuous, (49) can equivalently be represented as (14) and solved with SDYM.
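The reformulation (49)-(52) can be checked numerically. The Python sketch below builds $H$ and $c$, evaluates $F(u) = \min\{u, Hu + c\}$, and compares the objective of (51), shifted by the constant $\tfrac{1}{2}\|b\|^2$, with the $\ell_1$ objective (49) at $u = (v, w)$. The function names are hypothetical, and forming $H$ explicitly (size $2n\times 2n$) is only for illustration; practical implementations apply $S$ and $S^\top$ as operators.

```python
import numpy as np

def split_signal(x):
    # v_i = (x_i)_+ and w_i = (-x_i)_+, so x = v - w and ||x||_1 = sum(v) + sum(w).
    return np.maximum(x, 0.0), np.maximum(-x, 0.0)

def qp_data(S, b, gamma):
    # H and c of the bound-constrained quadratic program (51).
    B = S.T @ S
    H = np.block([[B, -B], [-B, B]])
    Stb = S.T @ b
    c = gamma * np.ones(2 * S.shape[1]) + np.concatenate([-Stb, Stb])
    return H, c

def F_qp(u, H, c):
    # Componentwise F(u) = min{u, Hu + c}, as in (52).
    return np.minimum(u, H @ u + c)

def l1_objective(x, S, b, gamma):
    # Objective of (49): 0.5*||Sx - b||^2 + gamma*||x||_1.
    res = S @ x - b
    return 0.5 * (res @ res) + gamma * np.abs(x).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, b, x = rng.standard_normal((4, 10)), rng.standard_normal(4), rng.standard_normal(10)
    v, w = split_signal(x)
    H, c = qp_data(S, b, 0.1)
    u = np.concatenate([v, w])
    print(l1_objective(x, S, b, 0.1), 0.5 * b @ b + 0.5 * u @ H @ u + c @ u)
```

The two printed values agree because $\tfrac{1}{2}u^\top H u = \tfrac{1}{2}\|S(v - w)\|^2$ and $c^\top u = \gamma e_{2n}^\top u - b^\top S(v - w)$, so the objective of (51) differs from (49) only by the constant $\tfrac{1}{2}\|b\|^2$ on such splittings.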

Experiments and reported results
Here, SDYM is applied to reconstruct a sparse signal of length $n$ from $k$ observations, with $k \ll n$. For the experiments, the HTTCGP and PCG algorithms, which have been used to solve signal problems, serve as benchmarks for SDYM. The PCG and HTTCGP algorithms were run with the parameter values used in the original papers. Apart from $z = 0.4$, the SDYM parameter values are the same as in the previous experiment. The quality of restoration is measured by the mean square error (MSE),

$$\mathrm{MSE} = \frac{1}{n}\|\tilde{x} - x\|^2,$$

where $\tilde{x}$ is the reconstructed signal and $x$ is the original one. By this definition, a scheme with a lower MSE yields a better-quality reconstruction. We used a signal of size $n = 2^{12}$ with $k = 2^{10}$ measurements, and the original signal contains $2^6$ randomly placed nonzero elements. In Matlab, the Gaussian matrix $S$ is generated with the function randn(k, n). Furthermore, the measurement is contaminated by noise, namely $b = Sx - \eta$, where $\eta$ is Gaussian noise distributed as $N(0, 10^{-4})$. The experiment is initialized with the measurement image $x_0 = S^\top b$ and terminates when the inequality … holds, where $f_k = \tfrac{1}{2}\|Sx_k - b\|_2^2 + \gamma\|x_k\|_1$ with parameter $\gamma > 0$.
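A minimal NumPy analogue of the test-problem generation described above might look as follows. The sizes and the noise model follow the text, while the Gaussian distribution of the nonzero entries, the random seed, and the function names are assumptions of this sketch; `numpy.random.Generator.standard_normal` plays the role of Matlab's randn.

```python
import numpy as np

def make_cs_problem(n=2**12, k=2**10, n_nonzero=2**6, noise_var=1e-4, seed=0):
    # Sparse test signal, Gaussian sampling matrix and noisy measurement b = Sx - eta.
    rng = np.random.default_rng(seed)
    x_true = np.zeros(n)
    support = rng.choice(n, size=n_nonzero, replace=False)   # random positions
    x_true[support] = rng.standard_normal(n_nonzero)          # assumed Gaussian values
    S = rng.standard_normal((k, n))                            # analogue of randn(k, n)
    eta = np.sqrt(noise_var) * rng.standard_normal(k)          # eta ~ N(0, 1e-4)
    b = S @ x_true - eta
    return S, b, x_true

def mse(x_rec, x_true):
    # Mean square error (1/n) * ||x_rec - x_true||^2.
    return float(np.mean((x_rec - x_true) ** 2))

def objective(x, S, b, gamma):
    # f(x) = 0.5 * ||Sx - b||_2^2 + gamma * ||x||_1, as in (49).
    res = S @ x - b
    return 0.5 * float(res @ res) + gamma * float(np.abs(x).sum())

if __name__ == "__main__":
    S, b, x_true = make_cs_problem()
    x0 = S.T @ b                      # measurement image used as the starting point
    print(mse(x0, x_true), objective(x0, S, b, gamma=0.01))
```

The starting point $x_0 = S^\top b$ is generally far from sparse, so its MSE gives a baseline against which the decrease in MSE reported for each solver can be read.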
The original sparse signal, its measurement, and the signals reconstructed by the three algorithms are displayed in Fig 4. It can be observed from Fig 4 that all three algorithms reconstructed the signal almost exactly from the measurement, with SDYM doing so much faster. In addition, Figs 5-8 show the convergence performance of the three methods in terms of mean square error (MSE), objective function values (ObjFE), processing time, and number of iterations. As observed from Figs 5-8, MSE and ObjFE decrease much faster for SDYM than for the HTTCGP and PCG methods. In addition, the experiment was repeated … times, and the results are given in Table 4. The results show that SDYM outperformed HTTCGP and PCG in three of the four metrics considered. These results show that SDYM is effective for decoding sparse signals in compressed sensing.

Conclusion
In this paper, we presented a spectral-type Dai-Yuan algorithm for constrained NCMS and applied it to reconstruct sparse signals in compressed sensing. The new algorithm fulfils the sufficient DC irrespective of the line search technique applied. Its global convergence was established under a monotone line search approach and standard assumptions. Furthermore, a comparison with three efficient methods shows that the proposed approach is efficient.